Optimizing substitution matrices by separating score distributions
Abstract
MOTIVATION Homology search is one of the most fundamental tools in bioinformatics. Typical alignment algorithms use substitution matrices and gap costs, so improving substitution matrices increases the accuracy of homology searches. Substitution matrices are generally derived from aligned sequences whose relationships are known, while gap costs are determined by trial and error. To discriminate relationships more clearly, we optimize substitution matrices from a statistical viewpoint, using both positive and negative examples within the framework of Bayesian decision theory.
RESULTS We optimized substitution matrices using the Clusters of Orthologous Groups (COG) database. On the COG database, the resulting matrix achieves higher classification accuracy than conventional substitution matrices, and it also performs well in classification on other databases.
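The abstract describes the idea (score related and unrelated sequence pairs with a substitution matrix, then adjust the matrix so the two score distributions separate) but not the exact objective or update rule. The following is a hypothetical illustration, not the authors' algorithm. It assumes ungapped, equal-length alignments (so the alignment score is a linear function of the matrix entries), a fixed decision threshold, and a logistic loss as a smooth stand-in for the Bayes classification error. The toy alphabet, the sequences, and the names `optimize` and `accuracy` are all made up for illustration.

```python
import math
from itertools import combinations_with_replacement

# Hypothetical toy alphabet; a real matrix would cover all 20 amino acids.
ALPHABET = "AEIKL"


def pair_key(a, b):
    """Substitution matrices are symmetric, so each unordered pair is stored once."""
    return (a, b) if a <= b else (b, a)


def pair_counts(seq1, seq2):
    """Count aligned residue pairs in an ungapped alignment of equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment requires equal-length sequences")
    counts = {}
    for a, b in zip(seq1, seq2):
        key = pair_key(a, b)
        counts[key] = counts.get(key, 0) + 1
    return counts


def score(matrix, counts):
    """Alignment score = sum of substitution scores; linear in the matrix entries."""
    return sum(matrix[key] * n for key, n in counts.items())


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def optimize(matrix, positives, negatives, threshold=0.0, lr=0.05, epochs=200):
    """Push positive-pair scores above `threshold` and negative-pair scores below it.

    Uses per-example gradient ascent on the logistic log-likelihood, a smooth
    surrogate for the 0/1 classification error (an assumption, not the paper's rule).
    """
    examples = [(pair_counts(*p), 1) for p in positives]
    examples += [(pair_counts(*n), 0) for n in negatives]
    m = dict(matrix)  # work on a copy; the input matrix is left unchanged
    for _ in range(epochs):
        for counts, label in examples:
            p = sigmoid(score(m, counts) - threshold)
            for key, n in counts.items():
                # d(log-likelihood)/d(m[key]) = (label - p) * n
                m[key] += lr * (label - p) * n
    return m


def accuracy(matrix, positives, negatives, threshold=0.0):
    """Fraction of pairs that fall on the correct side of the decision threshold."""
    correct = sum(score(matrix, pair_counts(*p)) > threshold for p in positives)
    correct += sum(score(matrix, pair_counts(*n)) <= threshold for n in negatives)
    return correct / (len(positives) + len(negatives))


# Hypothetical toy data: related (positive) and unrelated (negative) aligned pairs.
positives = [("AILKE", "AILKE"), ("AILKE", "ALLKE"), ("KEAIL", "KEAIL")]
negatives = [("AILKE", "KEAIL"), ("AILKE", "LKEAI"), ("EEKKA", "AILEE")]

initial = {pair: 0.0 for pair in combinations_with_replacement(ALPHABET, 2)}
trained = optimize(initial, positives, negatives)
print(accuracy(initial, positives, negatives), accuracy(trained, positives, negatives))
```

Running it prints `0.5 1.0`. With the all-zero matrix every pair scores 0, so only the negatives fall on the correct side of the threshold; after training, the identity pairs (and the conservative I/L pair) receive positive scores and the mismatch pairs negative ones. A realistic setup would also include gap costs and local alignment, where the score is no longer a simple linear function of the matrix entries.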
Similar articles
Amino acid similarity matrices based on force fields
MOTIVATION We propose a general method for deriving amino acid substitution matrices from low resolution force fields. Unlike current popular methods, the approach does not rely on evolutionary arguments or alignment of sequences or structures. Instead, residues are computationally mutated and their contribution to the total energy/score is collected. The average of these values over each posit...
Information Covariance Matrices for Multivariate Burr III and Logistic Distributions
The main result of this paper is the derivation of exact analytical expressions for the information and covariance matrices of multivariate Burr III and logistic distributions. These distributions arise as tractable parametric models for price and income distributions, reliability, economics, human populations, and certain biological organisms, and for modeling agricultural population data and survival data. We showed that ...
Blind source separation of convolved sources by joint approximate diagonalization of cross-spectral density matrices
In this paper we present a new method for separating non-stationary sources from their convolutive mixtures based on approximate joint diagonalization of the observed signals’ cross-spectral density matrices. Several blind source separation (BSS) algorithms have been proposed which use approximate joint diagonalization of a set of scalar matrices to estimate the instantaneous mixing matrix. We ex...
Dirichlet Mixtures: A Method for Improving Detection of
This paper presents the mathematical foundations of Dirichlet mixtures, which have been used to improve database search results for homologous sequences, when a variable number of sequences from a protein family or domain are known. We present a method for condensing the information in a protein database into a mixture of Dirichlet densities. These mixtures are designed to be combined with obse...
Journal:
- Bioinformatics
Volume 20, Issue 6
Publication date: 2004